A Hybrid Model for Sense Guessing of Chinese Unknown Words

Authors

  • Likun Qiu
  • Kai Zhao
  • Changjian Hu
Abstract

This paper proposes a hybrid model for guessing the senses of Chinese unknown words. Three types of similarity (positional, syntactic, and semantic) are analyzed, and a model is developed for each. The three models are then combined into a hybrid model, HPPS. To verify the effectiveness and consistency of HPPS, experiments were conducted on ten test sets collected from two popular Chinese thesauri, Cilin and HowNet. Additional experiments were run on a test set of 2000 words collected from newspaper text. The experiments show that HPPS consistently yields an F-score improvement of 4% to 6% over the best results reported in previous research.

Similar resources

Hybrid Methods for POS Guessing of Chinese Unknown Words

This paper describes a hybrid model that combines a rule-based model with two statistical models for the task of POS guessing of Chinese unknown words. The rule-based model is sensitive to the type, length, and internal structure of unknown words, and the two statistical models utilize contextual information and the likelihood for a character to appear in a particular position of words of a par...

Chinese POS Disambiguation and Unknown Word Guessing with Lexicalized HMMs

This article presents a lexicalized HMM-based approach to Chinese part-of-speech (POS) disambiguation and unknown word guessing (UWG). In order to explore word-internal morphological features for Chinese POS tagging, four types of pattern tags are defined to indicate the way lexicon words are used in a segmented sentence. Such patterns are combined further with POS tags. Thus, Chinese POS disam...

A Method for Automatic POS Guessing of Chinese Unknown Words

This paper proposes a method for automatic part-of-speech (POS) guessing of Chinese unknown words. It contains two models. The first model uses a machine-learning method to predict the POS of unknown words based on their internal component features. The credibility of the first model's results is then measured. For low-credibility words, the second model is used to revise the first model’s ...

Hybrid Models for Chinese Unknown Word Resolution Dissertation

Word segmentation, part-of-speech (POS) tagging, and sense tagging are important steps in various Chinese natural language processing (CNLP) systems. Unknown words, i.e., words that are not in the dictionary or training data used in a CNLP system, constitute a major challenge for each of these steps. This dissertation is concerned with developing hybrid models that effectively combine statistic...


Publication date: 2009